1. Introduction

In many real situations, observations are available only in certain groups. This process of recording or storing observations in groups leads directly to grouped data. Experimental observations made where precise measuring instruments are not available also result in grouped data. Various authors have provided examples from diverse fields where statistical analysis has to depend on grouped data; for reference, see Indrayan and Rustagi (1979). In that paper, approximate maximum likelihood estimates were discussed for regression models. In this paper, we provide techniques for testing hypotheses about parameters in the regression model when the data are grouped. We consider a test statistic which is similar to the conventional F statistic for the ungrouped case. A simulation study, performed for a few cases, shows that the proposed statistic has an approximate F-distribution.

The order of the approximation is governed by the ratio $h/\sigma$, where $h$ is the length of the interval of recorded observations and $\sigma$ is the standard deviation. In a way, this simulation study confirms the robustness of the F-statistic for the regression models in the grouped case and is likely to be very useful in applications.


2. Model and Notation


The notation used here is the same as in Indrayan and Rustagi (1979). It is assumed that there are $K$ populations, generated by random variables $y_1, y_2, \ldots, y_K$. Let $y_k$ be normally distributed with mean $\mu_k$ and variance $\sigma^2$, where

$$\mu_k = \beta_1 x_{k1} + \beta_2 x_{k2} + \cdots + \beta_p x_{kp}, \qquad k = 1, 2, \ldots, K, \tag{2.1}$$

the $\beta_i$'s are the unknown parameters and the $x$'s are known constants. It is assumed that there are $n_k$ independent observations on $y_k$, denoted as $y_{ku}$, $u = 1, 2, \ldots, n_k$, with $\sum_{k=1}^{K} n_k = n$. The matrix of observations, denoted by

$$X = (x_{ij}), \qquad i = 1, 2, \ldots, n, \quad j = 1, 2, \ldots, p,$$

is known.

Suppose the possible values of the random variables are recorded in intervals $[C_{i-1}, C_i)$, $i = \ldots, -2, -1, 0, 1, 2, \ldots$, with $C_i - C_{i-1} = h$. Let $n_{ki}$ be the number of observations on $Y_k$ in the interval $[C_{i-1}, C_i)$ and let the probability that $Y_k$ falls in this interval be $\pi_{ki}$.

Let $C$ be a matrix of order $m \times p$ of known constants and let $\beta' = (\beta_1, \ldots, \beta_p)$. We are interested in the test of the hypothesis

$$H_0: C\beta = a \tag{2.2}$$

versus the alternative

$$H_1: C\beta \neq a,$$

for some given constant vector $a$.

In the usual ungrouped case, where $K$ populations are tested for the above hypothesis, the analysis of variance test utilizing the F-distribution is generally available in most books; see, for example, Rao (1973).

For the grouped data case, the likelihood of the sample $L(\pi)$ is obtained in terms of multinomials.

Let the maximum likelihood estimates of $\pi$ under the null hypothesis be denoted by $\hat{\pi}(\omega)$ and under the full model by $\hat{\pi}(\Omega)$. Maximum likelihood estimates were discussed in an earlier paper by the authors (1979). The likelihood ratio given by

$$\lambda = \frac{L(\hat{\pi}(\omega))}{L(\hat{\pi}(\Omega))} \tag{2.3}$$

is used to provide tests for the hypotheses (2.2). The asymptotic distribution of $-2 \log \lambda$ is chi-squared, as in the ungrouped case, under an approximation that ignores higher-order terms in $h/\sigma$.


3. F-statistic

In the usual ungrouped case, the test of the general linear hypothesis is obtained in terms of an F test. For the sake of completeness, we state below the Fundamental Lemma of Analysis of Variance, Rao (1973).

Lemma 3.1. For the Gauss-Markov model $(Y, X\beta, \sigma^2 I)$, the test of hypothesis $C\beta = a$, where $C$ is a given $m \times p$ matrix of rank $m$, is given by the statistic

$$F = \frac{(R_1^2 - R_0^2)/m}{R_0^2/(n - r)} \tag{3.1}$$

where $r$ = rank of matrix $X$, with

$$R_0^2 = \min_{\beta}\,(Y - X\beta)'(Y - X\beta)$$

and

$$R_1^2 = \min_{C\beta = a}\,(Y - X\beta)'(Y - X\beta).$$

The statistic (3.1) has an F-distribution with $m$ and $n - r$ degrees of freedom.

The test of hypothesis in the grouped-data case can be similarly obtained in terms of a statistic which is the ratio of sums of squares. Let the mid point variable $M$ be defined by the following:

$$M = \tfrac{1}{2}(C_{i-1} + C_i) \quad \text{if and only if} \quad Y \in [C_{i-1}, C_i), \qquad i = \ldots, -2, -1, 0, 1, 2, \ldots$$
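The mid point mapping can be sketched in a few lines. This is a minimal illustration, assuming that the interval boundaries lie on the grid `origin + i*h`; the function name `to_midpoints` and the `origin` parameter are hypothetical and do not come from the paper.

```python
import numpy as np

def to_midpoints(y, h, origin=0.0):
    """Replace each observation by the midpoint of the width-h interval containing it.

    Assumes (hypothetically) that interval boundaries are origin + i*h.
    h == 0 is treated as the ungrouped case and returns the data unchanged.
    """
    y = np.asarray(y, dtype=float)
    if h == 0:
        return y.copy()
    i = np.floor((y - origin) / h)      # index of the left boundary of the interval
    return origin + (i + 0.5) * h       # midpoint of [origin + i*h, origin + (i+1)*h)

print(to_midpoints([0.0, 1.9, 2.0, -0.1], h=2.0))
```

Intervals are closed on the left and open on the right, so an observation that falls exactly on a boundary is assigned to the interval above it.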


The approximate maximum likelihood estimate of $\beta$ is given by

$$\hat{\beta}_0 = (X'X)^{-} X'M \tag{3.2}$$

where $A^{-}$ denotes the generalized inverse (g-inverse) of the matrix $A$ (for reference, see Rao and Mitra (1971)) and $M$ is the vector of mid points resulting from the data. The mid point estimator leads to a statistic

$$F_0 = \frac{(R_{1M}^2 - R_{0M}^2)/m}{R_{0M}^2/(n - r)} \tag{3.3}$$

where

$$R_{0M}^2 = \min_{\beta}\,(M - X\beta)'(M - X\beta)$$

and

$$R_{1M}^2 = \min_{C\beta = a}\,(M - X\beta)'(M - X\beta).$$
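A numerical sketch of this statistic follows. It is not the authors' computational scheme: it uses the standard identity $R_{1M}^2 - R_{0M}^2 = (C\hat{\beta}_0 - a)'[C(X'X)^{-}C']^{-1}(C\hat{\beta}_0 - a)$, which holds when $C\beta$ is estimable and $C$ has rank $m$, and it takes the Moore-Penrose inverse as the g-inverse. The function name `f0_statistic` is hypothetical.

```python
import numpy as np

def f0_statistic(M, X, C, a):
    """Mid point F0 statistic for H0: C beta = a (sketch; assumes C beta is estimable)."""
    M = np.asarray(M, dtype=float)
    X = np.asarray(X, dtype=float)
    C = np.atleast_2d(np.asarray(C, dtype=float))
    a = np.atleast_1d(np.asarray(a, dtype=float))
    n = X.shape[0]
    m = C.shape[0]                       # number of restrictions
    r = np.linalg.matrix_rank(X)         # rank of the design matrix
    G = np.linalg.pinv(X.T @ X)          # Moore-Penrose inverse used as the g-inverse
    beta0 = G @ X.T @ M                  # mid point estimate of beta
    resid = M - X @ beta0
    r0 = resid @ resid                   # unrestricted residual sum of squares
    d = C @ beta0 - a
    extra = d @ np.linalg.solve(C @ G @ C.T, d)   # restricted minus unrestricted SS
    return (extra / m) / (r0 / (n - r)), beta0

# Straight-line example: test that the slope is zero.
X = np.column_stack([np.ones(4), np.arange(4.0)])
F, beta0 = f0_statistic([0.0, 1.0, 2.0, 4.0], X, C=[[0.0, 1.0]], a=[0.0])
print(F, beta0)
```

With `h = 0` the mid points coincide with the observations and the function returns the ordinary F statistic of Lemma 3.1.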

As usual, we evaluate the sums of squares in the statistic (3.3) by using the following notation.

Suppose  ("*)-(**■  , 

**  z % V 

then 


"OM 


I'M 


(3M 


(3.5) 


To ensure estimability of $C\beta$ we assume that $C = AX$ for some $A$, with rank of $C = m$.

Without loss of generality, the statistic $F_0$ can be considered for the case of testing the hypothesis that $\beta = 0$. In that case we have

$$R_{0M}^2 = \min_{\beta}\,(M - X\beta)'(M - X\beta) = (M - X\hat{\beta}_0)'(M - X\hat{\beta}_0) = M'M - \hat{\beta}_0' X'M$$

and $R_{1M}^2 = M'M$, so that $R_{1M}^2 - R_{0M}^2 = \hat{\beta}_0' X'M$. Usually $R_{0M}^2$ is called the Error Sum of Squares (Error SS) and $R_{1M}^2 - R_{0M}^2$ the Regression Sum of Squares (Regression SS).

To show that the asymptotic distribution of $mF_0$, where

$$F_0 = \frac{(\text{Regression SS})/m}{(\text{Error SS})/(n - m)}, \tag{3.6}$$

is chi-squared to the order of approximation implied by Sheppard's corrections (Cramér (1974, pp. 359-362)), which is assumed to be negligible here, we have the following lemmas.


Lemma 3.2. The asymptotic distribution of

$$\hat{\beta}_0 = (X'X)^{-} X'M \tag{3.7}$$

as $n_k \to \infty$ is $p$-variate normal with mean $\beta$ and variance-covariance matrix $(X'X)^{-}\left(\sigma^2 + \frac{h^2}{12}\right)$, to the order of approximation implied by Sheppard's correction.


Proof: Let

$$M = (M_1', M_2', \ldots, M_K')'$$

where $M_k$ is an $n_k$-vector, $k = 1, 2, \ldots, K$. Suppose $(X'X)^{-} X' = B = (B_1, \ldots, B_K)$ with $B_i$ being a $p \times n_i$ matrix, $i = 1, 2, \ldots, K$. With the above definitions,

$$\hat{\beta}_0 = \sum_{k=1}^{K} B_k M_k. \tag{3.8}$$

Let the elements of the partitions of the matrix $B$ be denoted by $b_{tm}(i)$, $t = 1, 2, \ldots, p$, $m = 1, 2, \ldots, n_i$. Notice that the first $n_1$ columns of $X'$ are identical, the next $n_2$ columns are also identical, and so on. Therefore, the columns of the matrix $B_i$ are all identical for $i = 1, 2, \ldots, K$. Hence $B_i M_i$, for each $i$, reduces to a constant multiple of the sum of the $n_i$ elements of the vector $M_i$. Since the components of this sum are independent and identically distributed, by the application of the Central Limit Theorem, the vector $B_i M_i$ is asymptotically $p$-variate normal for each $i$.

The distribution of $\hat{\beta}_0$ is consequently also $p$-variate normal.

Since $E(M) = X\beta$ (subject to the approximation implied by Sheppard's correction) and the variance of the components of $M$ is $\sigma^2 + \frac{h^2}{12}$, it follows that

$$E(\hat{\beta}_0) = \beta \tag{3.9}$$

and

$$\mathrm{Cov}(\hat{\beta}_0) = (X'X)^{-}\left(\sigma^2 + \frac{h^2}{12}\right). \tag{3.10}$$

The proof of the lemma is now complete.
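The variance $\sigma^2 + h^2/12$ used in this proof can be checked numerically for a single normal variable. The sketch below computes the exact mean and variance of the mid point variable by summing over intervals, assuming boundaries at integer multiples of $h$; the helper names `normal_cdf` and `midpoint_moments` are hypothetical.

```python
import math

def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of N(mu, sigma^2)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def midpoint_moments(mu, sigma, h, span=12.0):
    """Exact mean and variance of the mid point variable M for Y ~ N(mu, sigma^2).

    Interval boundaries are assumed to sit at i*h; intervals further than
    span*sigma from mu carry negligible probability and are dropped.
    """
    lo = math.floor((mu - span * sigma) / h)
    hi = math.ceil((mu + span * sigma) / h)
    mean = 0.0
    second = 0.0
    for i in range(lo, hi):
        p = normal_cdf((i + 1) * h, mu, sigma) - normal_cdf(i * h, mu, sigma)
        mid = (i + 0.5) * h
        mean += p * mid
        second += p * mid * mid
    return mean, second - mean * mean

mean, var = midpoint_moments(mu=0.3, sigma=1.0, h=1.0)
print(mean, var, 1.0 + 1.0 / 12.0)   # Sheppard's value for the variance is sigma^2 + h^2/12
```

Re-running with larger `h` relative to `sigma` shows where the correction starts to lose accuracy, which is the regime the simulations probe.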


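Lemma 3.3. To the order of approximation implied by the Sheppard correction, the mean and variance of the error sum of squares are given by

$$E(\text{Error SS}) = (n - p)\,\sigma_0^2, \tag{3.11}$$

$$V(\text{Error SS}) = 2(n - m)\,\sigma_0^4 + (\mu_4 - 3\sigma_0^4) \sum_i b_{ii}^2, \tag{3.12}$$

where $\mu_4 = E(M_i - E(M_i))^4$,

$$\sigma_0^2 = \sigma^2 + \frac{h^2}{12} \tag{3.13}$$

and

$$B = (b_{ij}) = I - X(X'X)^{-}X'. \tag{3.14}$$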


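Proof. Applying results of Theorem 1 of Searle (1971, p. 55) and (3.10), we have

$$E(\text{Error SS}) = \sigma_0^2\,\mathrm{tr}(B) + (X\beta)'B(X\beta) = (n - m)\,\sigma_0^2.$$

In a paper by Hsu (1938), it has been shown that a quadratic form $Q = \sum_i \sum_j b_{ij} Z_i Z_j / ((n - m)\sigma^2)$ in independent variables $Z_i$, with $E(Q) = 1$ and $\mathrm{Var}(Z_i) = \sigma^2$ for all $i$, has the following property:

$$\mathrm{Var}\, Q = \frac{1}{(n - m)^2}\left[\,2 \sum_i \sum_j b_{ij}^2 + \left(\frac{\mu_4}{\sigma^4} - 3\right) \sum_i b_{ii}^2\right],$$

where $\mu_4 = E(Z_i - E(Z_i))^4$.

Suppose now

$$Q = \frac{\text{Error SS}}{(n - p)\,\sigma_0^2}$$

so that $E(Q) = 1$. Also the matrix $B$ is symmetric and idempotent with rank $n - m$ and hence $\sum_i \sum_j b_{ij}^2 = n - m$. By assumptions of normality and using Sheppard's corrections, we have

$$E(M_i - E(M_i))^4 = 3\sigma^4 + \tfrac{1}{2}\sigma^2 h^2 + \tfrac{1}{80} h^4$$

for all $i$. The result (3.12) follows.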
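As a worked check of this step (a sketch that uses only $\sigma_0^2 = \sigma^2 + h^2/12$ and the Sheppard-corrected fourth moment $3\sigma^4 + \tfrac{1}{2}\sigma^2 h^2 + \tfrac{1}{80}h^4$ of a mid point), the departure from normal kurtosis can be made explicit:

$$\begin{aligned}
3\sigma_0^4 &= 3\left(\sigma^2 + \tfrac{1}{12}h^2\right)^2 = 3\sigma^4 + \tfrac{1}{2}\sigma^2 h^2 + \tfrac{1}{48}h^4, \\
\mu_4 - 3\sigma_0^4 &= \tfrac{1}{80}h^4 - \tfrac{1}{48}h^4 = -\tfrac{1}{120}h^4 .
\end{aligned}$$

Under these assumptions the non-normal part of the variance of the Error SS is of order $h^4$, which is small relative to $2(n - m)\sigma_0^4$ whenever $h/\sigma$ is small.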

From Lemma 3.2, we know that the distribution of (Regression SS)$/\sigma_0^2$ as $n_k \to \infty$ is a non-central chi-squared distribution with $p$ degrees of freedom with noncentrality parameter

$$\delta = \beta' X'X \beta / \sigma_0^2 \tag{3.15}$$

to the order of approximation implied by Sheppard's corrections. Further, since $\sum_i b_{ii}^2 \le n - m$, it follows from (3.11) and (3.12) that (Error SS)$/((n - m)\sigma_0^2)$ tends to 1 in probability. Hence $mF_0$ as given in (3.3) has a noncentral chi-squared distribution with noncentrality parameter $\delta$.

Note that under the null hypothesis $\delta = 0$, hence the asymptotic distribution of $mF_0$ is central chi-squared. Therefore the test can be easily performed.

For small $h$, the distribution of $F_0$ may turn out to be close to the F-distribution. Box and Andersen (1955) have developed robust tests for non-normal populations using the following. Assume that the distribution of (Error SS)$/((n - m)\sigma_0^2)$ is $\chi^2_\nu/\nu$, where the degrees of freedom $\nu$ are obtained by a method of moments given by

$$\nu = n - m + \frac{h^4}{240\,\sigma^4} \sum_i b_{ii}^2 + O\!\left(\frac{h^6}{\sigma^6}\right). \tag{3.16}$$

For small $h/\sigma$, we have hardly any correction to the degrees of freedom, and then the test can be performed as an F test. This behavior of the statistic $F_0$ has been studied through simulations, and the goodness of the approximation is measured in terms of the Kolmogorov-Smirnov statistic in the next section.
4. Simulations


Two  models  have  been  considered  for  simulations. 


Model I.

$$Y_k \sim N(\alpha + \beta x_k, \sigma^2), \qquad k = 1, 2, \ldots, K.$$

We consider $K = 10$, with $x_1 = 0, x_2 = 1, \ldots, x_{10} = 10$, and $n_k = 100$ (same sample sizes for every $h$). We consider two cases of $\alpha$, $\beta$:

$\alpha = 10$, $\beta = 0$

and

$\alpha = 40$, $\beta = 0$.

The values of $\sigma^2$ are chosen to be 25 or 100.

The size of $h$ is taken as 0, 2, 3, 4, 5, 10, 15, 20.

The hypothesis considered here is

$$H_0: \beta = 0.$$

Using the usual IBM Random Number Generator package, samples were generated and then were grouped in intervals of size $h$. The F statistic was calculated for the ungrouped data case and the $F_0$ statistic for the various grouped data cases. The empirical cumulative distributions of the statistics were then computed and compared with the theoretical F-distribution using the Kolmogorov-Smirnov distribution. Tables I and II describe the results of the simulations for $n_k = 10$ and $n_k = 100$ respectively. The last column gives the tail probability for significance in both cases. The column with heading D gives the actual value of the Kolmogorov-Smirnov statistic.
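The procedure just described can be sketched as follows. This is not the original program: it uses NumPy's random generator in place of the IBM package, takes the design values $x_k = 0, 1, \ldots, 9$ as an assumption, places interval boundaries at multiples of $h$, and reports both the raw Kolmogorov-Smirnov distance scaled by the square root of the number of replicates and the p-value from `scipy.stats.kstest`. The function name `simulate_model_one` and all default arguments are hypothetical.

```python
import numpy as np
from scipy import stats

def simulate_model_one(h, alpha=10.0, beta=0.0, sigma=5.0, n_k=10, n_rep=500, seed=0):
    """Simulate F0 for H0: beta = 0 under Model I and compare it with F(1, n - 2)."""
    rng = np.random.default_rng(seed)
    x = np.repeat(np.arange(10, dtype=float), n_k)   # assumed design: x_k = 0, ..., 9
    X = np.column_stack([np.ones_like(x), x])
    n, p = X.shape
    f_values = []
    for _ in range(n_rep):
        y = rng.normal(alpha + beta * x, sigma)
        m = y if h == 0 else (np.floor(y / h) + 0.5) * h   # mid points of width-h intervals
        coef, *_ = np.linalg.lstsq(X, m, rcond=None)
        error_ss = np.sum((m - X @ coef) ** 2)
        total_ss = np.sum((m - m.mean()) ** 2)
        if error_ss > 0:          # very coarse grouping can make every mid point equal
            f_values.append((total_ss - error_ss) / (error_ss / (n - p)))
    f_values = np.asarray(f_values)
    res = stats.kstest(f_values, stats.f(1, n - p).cdf)
    return np.sqrt(len(f_values)) * res.statistic, res.pvalue

for h in (0, 2, 5, 20):
    print(h, simulate_model_one(h))
```

Replicates with a zero Error SS are dropped rather than counted, which is a simplification; a fuller study would report how often that happens for large $h/\sigma$.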


Table I

   h    ν1    ν2       D         P

σ² = 25, α = 10
   0     1    98    0.6120    0.8481
   2     1    98    0.6311    0.8206
   3     1    98    0.6988    0.8658
   4     1    98    0.4953    0.9669
   5     1    98    0.8531    0.4606
  10     1    98    1.1812    0.1227
  15     1    98    0.8534    0.4602
  20     1    98    1.1241    0.1597

σ² = 25, α = 40
   0     1    98    0.9169    0.3699
   2     1    98    0.9339    0.3477
   3     1    98    0.8668    0.4401
   4     1    98    0.6613    0.7743
   5     1    98    0.7018    0.7082
  10     1    98    0.9671    0.3070
  15     1    98    0.8695    0.4362
  20     1    98    1.2566    0.0850

σ² = 100, α = 10
   0     1    98    0.6721    0.7570
   2     1    98    0.8685    0.4376
   3     1    98    0.7632    0.6050
   4     1    98    0.5060    0.9600
   5     1    98    0.5713    0.8999
  10     1    98    0.6226    0.8330
  15     1    98    0.7894    0.5614
  20     1    98    0.8204    0.5114

σ² = 100, α = 40
   0     1    98    0.7245    0.6701
   2     1    98    0.7987    0.5463
   3     1    98    0.8099    0.5281
   4     1    98    0.8101    0.5278
   5     1    98    0.7078    0.6982
  10     1    98    0.8516    0.4629
  15     1    98    0.9647    0.3098
  20     1    98    0.6869    0.7330
Table II

   h    ν1    ν2       D         P

σ² = 25, α = 10
   0     1   998    0.8214    0.5097
   2     1   998    0.9069    0.3832
   3     1   998    0.8081    0.5310
   4     1   998    0.7360    0.6507
   5     1   998    0.8411    0.4790
  10     1   998    0.4508    0.9871
  15     1   998    0.9026    0.3891
  20     1   998    0.7345    0.6532

σ² = 25, α = 40
   0     1   998    0.5086    0.9562
   2     1   998    0.7267    0.6664
   3     1   998    0.6666    0.7659
   4     1   998    0.5544    0.9184
   5     1   998    0.6424    0.8037
  10     1   998    0.5871    0.8809
  15     1   998    0.7458    0.6342
  20     1   998    1.1031    0.1753

σ² = 100, α = 10
   0     1   998    0.7605    0.6096
   2     1   998    0.7578    0.6140
   3     1   998    0.7627    0.6058
   4     1   998    0.9348    0.3466
   5     1   998    0.8685    0.4376
  10     1   998    0.6699    0.7606
  15     1   998    0.6582    0.7792
  20     1   998    0.9366    0.3442

σ² = 100, α = 40
   0     1   998    0.7652    0.6017
   2     1   998    0.7107    0.6934
   3     1   998    0.7613    0.6082
   4     1   998    0.7483    0.6300
   5     1   998    0.5413    0.9313
  10     1   998    0.5698    0.9015
  15     1   998    0.8149    0.5201
  20     1   998    0.5546    0.9182


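Model II. We consider here the regression model

$$y_{ku} \sim N(\beta_1 x_{k1} + \beta_2 x_{k2} + \beta_3 x_{k3}, \sigma^2), \qquad k = 1, 2, \ldots, 4.$$

Four sets of $x_{kj}$ are utilized:

(i)    $x_{11} = 0$,  $x_{12} = 1$,  $x_{13} = 3$
(ii)   $x_{21} = 2$,  $x_{22} = 4$,  $x_{23} = 6$
(iii)  $x_{31} = 7$,  $x_{32} = 5$,  $x_{33} = 3$
(iv)   $x_{41} = 9$,  $x_{42} = 1$,  $x_{43} = 8$

$h$ = 0, 10, 50, 100, 200

$\sigma = 100$

$n_k$ = 10, 25, 100 (same for all $k$)

$\beta_1 = \beta_2 = \beta_3 = 0$.

The hypothesis tested here is

$$H_0: \beta_2 = \beta_3 = 0.$$

The results are given in Table III and are based on 100 samples.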


Table III

   h    ν1    ν2       D         P

n_k = 10 for all k
   0     2    37    0.7567    0.6158
  10     2    37    0.6984    0.7138
  50     2    37    0.7600    0.6104
 100     2    37    0.7239    0.6712
 200     2    37    1.2845    0.0738

n_k = 25 for all k
   0     2    97    0.8039    0.5378
  10     2    97    0.8742    0.4294
  50     2    97    0.8401    0.4805
 100     2    97    0.7487    0.6293
 200     2    97    0.9388    0.3415

n_k = 100 for all k
   0     2   397    0.7705    0.5947
  10     2   397    0.7468    0.6325
  50     2   397    0.8749    0.4284
 100     2   397    1.1426    0.1469
 200     2   397    1.1312    0.1546


5. Discussion

Comparison of the tail probability of the Kolmogorov-Smirnov statistic for $h = 0$ (that is, the ungrouped case) with those for various values of $h$ shows that, on the whole, the empirical cumulative distributions are the same for the cases $h/\sigma < 1$. When $h/\sigma > 1$, we do not have very good results. In general, one could make the statement that the F-distribution is a fairly good approximation to the distribution of the statistic $F_0$. More extensive simulations may be able to provide further evidence of this correspondence.
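The tail probabilities compared above can be related to the tabulated D values by a short calculation. Several D values in the tables exceed 1, so D seems to be the scaled statistic $\sqrt{N}\,D_N$ rather than the raw maximum distance; under that reading (an inference, not something the text states) the P column would follow from the limiting Kolmogorov distribution. The function name `kolmogorov_tail` is hypothetical.

```python
import math

def kolmogorov_tail(x, terms=100):
    """Limiting tail probability P(sqrt(N) * D_N > x) for the two-sided KS statistic.

    Uses the alternating series 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 x^2).
    """
    if x <= 0:
        return 1.0
    total = 0.0
    for k in range(1, terms + 1):
        total += (-1) ** (k - 1) * math.exp(-2.0 * k * k * x * x)
    return min(1.0, max(0.0, 2.0 * total))

# Compare with two (D, P) pairs from Table I: (0.6120, 0.8481) and (1.1812, 0.1227).
for d in (0.6120, 1.1812):
    print(d, round(kolmogorov_tail(d), 4))
```

If the printed values agree with the tabulated P entries to about three decimals, that supports the scaled-statistic reading of the D column.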